On-demand Indexing for Large Scientific Data
Abstract
Enabling search alleviates the need for manual file management, allowing users to find files by the features that are most relevant to them. Despite the need for comprehensive file system search, most file system indexing work has focused on adapting existing solutions, such as the RDBMS or spatial trees, to index files. Although a step in the right direction, these approaches have focused their testing on typical POSIX metadata, which is fully populated, low dimensional, and primarily numeric [2, 3]. By contrast, scientific data is high dimensional, heterogeneous, and very sparse [4]. These findings indicate that approaches such as naive row-major databases and spatial trees scale poorly to scientific data, as they waste space storing null values, have inflexible schemas, and have trouble with many-to-one values. We propose an indexing technique called On-demand Indexing, which creates the relevant index at search time if one does not already exist, while simultaneously providing query results. On-demand Indexing will achieve scalability by indexing only searched-for terms and by using a storage substrate better suited to high dimensional, heterogeneous, sparse data. Despite the simple approach of indexing everything that is searched for, we believe that, like web search [1], file system search term frequency follows a power-law distribution. This means that a majority of searches will involve a few frequently searched terms. As a result, the system would provide low latency, high precision, high recall search without the storage and computational overhead of indexing all data.
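The core idea in the abstract, building an index for an attribute the first time it is searched and answering the query in the same pass, can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: the class name `OnDemandIndex`, the per-attribute inverted index, the sample file names, and the metadata keys are all hypothetical, and metadata values are assumed to be hashable. Sparse metadata is modeled as dictionaries that simply omit missing attributes, so no null values are stored.

```python
from collections import defaultdict


class OnDemandIndex:
    """Toy model (hypothetical): an attribute is indexed the first time it is searched."""

    def __init__(self, files):
        # files: dict mapping path -> sparse metadata dict; absent attributes are just missing keys
        self.files = files
        self.indexes = {}  # attribute -> {value -> set of paths}
        self.scans = 0     # number of full scans performed so far

    def search(self, attr, value):
        index = self.indexes.get(attr)
        if index is None:
            # First query on this attribute: a single scan builds the index
            # that is then used to answer this query and all later ones.
            index = defaultdict(set)
            for path, meta in self.files.items():
                if attr in meta:
                    index[meta[attr]].add(path)
            self.indexes[attr] = index
            self.scans += 1
        # .get avoids inserting empty entries for values that were never seen
        return set(index.get(value, ()))


# Hypothetical sample data: each file carries only the attributes it actually has.
files = {
    "run1.h5": {"instrument": "CTD", "depth_m": 120},
    "run2.h5": {"instrument": "CTD"},
    "notes.txt": {"author": "kim"},
}
idx = OnDemandIndex(files)
first = idx.search("instrument", "CTD")   # triggers one scan and indexes "instrument"
second = idx.search("instrument", "CTD")  # served from the existing index, no new scan
```

In this sketch only searched-for attributes ever consume index space, so if query frequency is as skewed as the abstract hypothesizes, a few indexes would serve most searches. A real system would also need persistent storage and index maintenance as files change, which the sketch leaves out.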
Similar resources
A Study of the Application of Cataloging and Filing Rules for Patient Index Cards in the Teaching Hospitals of Mazandaran University of Medical Sciences, 1383 SH (2004/05)
Background and purpose: The master patient's index (MPI) card is the key to locating the patient's record in the medical records department. Use of MPI in hospital information systems is important. An accurate MPI is noted in evaluation and accreditation programs. Our study was done on MPI at the medical records department of teaching hospitals in Mazandaran medical university in respect of using indexi...
New Developments in Biological Abstracting And Indexing
CURRENT ISSUES facing abstracting and indexing services serving scientists in biology, medicine, and agriculture bring into sharp focus the pressures which result from accelerating needs and from the impact of contemporary technology on document preparation and control. Nonetheless, the patterns of publication of scientific literature which we may describe today have been shaped during the past...
Accelerating Queries on Very Large Datasets
In this chapter, we explore ways to answer queries on large multi-dimensional data efficiently. Given a large dataset, a user often wants to access only a relatively small number of the records. Such a selection process is typically performed through an SQL query in a database management system (DBMS). In general, the most effective technique to accelerate the query answering process is indexin...
E2DR: Energy Efficient Data Replication in Data Grid
Abstract: Data grids are an important branch of grid computing which provide mechanisms for the management of large volumes of distributed data. Energy efficiency has recently emerged as a hot topic in large distributed systems. The development of computing systems is traditionally focused on performance improvements driven by the demand of client's applications in scientific and business domai...
The Hybrid Digital Tree: A New Indexing Technique for Large String Databases
There is an increasing demand for efficient indexing techniques to support queries on large string databases. In this paper, a hybrid RAM/disk-based index structure, called the Hybrid Digital tree (HD-tree), is proposed. The HD-tree keeps internal nodes in the RAM to minimize the number of disk I/Os, while maintaining leaf nodes on the disk to maximize the capability of the tree for indexing la...
Publication date: 2013